Noisy Data Make the Partial Digest Problem NP-hard

Authors

  • Mark Cieliebak
  • Stephan Eidenbenz
  • Paolo Penna
Abstract

The Partial Digest problem, well known for its applications in computational biology and for the intriguingly open status of its computational complexity, asks for the coordinates of $n$ points on a line such that the pairwise distances of the points form a given multi-set of $\binom{n}{2}$ distances. In an effort to model real-life data, we study the computational complexity of a minimization version of Partial Digest, in which only a subset of all pairwise distances is given and the rest are lacking due to experimental errors. We show that this variation is NP-hard to solve exactly, thus making the existence of polynomial-time algorithms for this problem extremely unlikely. Our result answers an open question posed by Pevzner (2000). We then study a maximization version of Partial Digest where a superset of all pairwise distances is given, with some additional distances due to inaccurate measurements. We show that this maximization version is NP-hard to approximate to within a factor of $|D|^{\frac{1}{2}-\varepsilon}$ for any $\varepsilon > 0$, where $|D|$ is the number of input distances, which implies that polynomial-time algorithms cannot even guarantee to find a solution for the problem that comes close to the optimum. Our inapproximability result is tight up to low-order terms as we give a trivial approximation algorithm that achieves a matching approximation ratio. Our optimization variations model two different error types that occur in real-life data.
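
To make the problem statement concrete, the following minimal Python sketch computes the multi-set of pairwise distances of a point set and checks it against an input multi-set $D$ under the three settings the abstract mentions. The function names are hypothetical, and the two containment checks are only one plausible reading of the "subset given" and "superset given" variations; the abstract does not spell out the formal definitions, and the sketch does not implement any algorithm from the paper.

```python
from collections import Counter
from itertools import combinations


def pairwise_distances(points):
    """Multi-set (as a Counter) of all pairwise distances of the points."""
    return Counter(abs(a - b) for a, b in combinations(points, 2))


def is_exact_solution(points, distances):
    """Error-free Partial Digest: the distance multi-sets must coincide."""
    return pairwise_distances(points) == Counter(distances)


def covers_given_subset(points, distances):
    """Missing-distances setting (assumed reading): every given distance
    must occur, with multiplicity, among the points' pairwise distances."""
    given = Counter(distances)
    produced = pairwise_distances(points)
    return all(produced[d] >= count for d, count in given.items())


def uses_only_given_superset(points, distances):
    """Additional-distances setting (assumed reading): every pairwise
    distance of the points must be available, with multiplicity, in the input."""
    given = Counter(distances)
    produced = pairwise_distances(points)
    return all(given[d] >= count for d, count in produced.items())
```

For example, the points 0, 2, 4, 7, 10 produce the $\binom{5}{2} = 10$ distances 2, 2, 3, 3, 4, 5, 6, 7, 8, 10, so `is_exact_solution([0, 2, 4, 7, 10], [2, 2, 3, 3, 4, 5, 6, 7, 8, 10])` holds. Under the readings assumed here, the minimization version would look for as few points as possible that pass `covers_given_subset`, and the maximization version for as many points as possible that pass `uses_only_given_superset`.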

Similar resources

Noisy Data Make the Partial Digest Problem NP-hard

The problem of finding the coordinates of $n$ points on a line such that the pairwise distances of the points form a given multi-set of $\binom{n}{2}$ distances is known as the Partial Digest problem, which occurs for instance in DNA physical mapping and de novo sequencing of proteins. Although Partial Digest was, as a combinatorial problem, already proposed in the 1930’s, its computational complexity is still u...

Modeling of Partial Digest Problem as a Network flows problem

Restriction Site Mapping is one of the interesting tasks in Computational Biology. A DNA strand can be thought of as a string over the letters A, T, C, and G. When a particular restriction enzyme is added to a DNA solution, the DNA is cut at particular restriction sites. The goal of restriction site mapping is to determine the location of every site for a given enzyme. In partial digest metho...

Measurement Errors Make the Partial Digest Problem NP-Hard

The Partial Digest problem asks for the coordinates of m points on a line such that the pairwise distances of the points form a given

Double Digest Revisited: Complexity and Approximability in the Presence of Noisy Data

We revisit the double digest problem, which occurs in sequencing of large DNA strings and consists of reconstructing the relative positions of cut sites from two different enzymes: we first show that double digest is strongly NP-complete, improving previous results that only showed weak NP-completeness. Even the (experimentally more meaningful) variation in which we disallow coincident cut sites ...

A Continuous Optimization Model for Partial Digest Problem

The purpose of this paper is to model the Partial Digest Problem (PDP) as a mathematical programming problem. In this paper we present a new viewpoint of PDP. We formulate the PDP as a continuous optimization problem and develop a method to solve this problem. Finally we construct a linear programming model for the problem with an additional constraint. This latter model can be solved by the simp...

Publication date: 2007